Abstract. We study the program complexity of datalog on both finite and infinite linear
orders. Our main result states that on all linear orders with at least two elements,
the nonemptiness problem for datalog is EXPTIME-complete. While containment of the
nonemptiness problem in EXPTIME is known for finite linear orders and actually for
arbitrary finite structures, it is not obvious for infinite linear orders. It sharply contrasts
with the situation on other infinite structures; for example, the datalog nonemptiness problem
on an infinite successor structure is undecidable. We extend our upper bound results to
infinite linear orders with constants. As an application, we show that the datalog
nonemptiness problem on Allen's interval algebra is EXPTIME-complete.

1998 ACM Subject Classification: F.4.1, D.3.2, H.2.3.

1. Introduction

Datalog is the language of logic programming without function symbols. Datalog has
been extensively studied in database theory (see, e.g., [2, …]). In
particular, the complexity of evaluating datalog queries has been determined: The data
complexity is PTIME-complete, and the program complexity (also known as expression
complexity) and combined complexity are both EXPTIME-complete [8, 23].

While previous work on datalog was concerned with datalog over finite structures, in
this paper we are mainly interested in infinite structures. Infinite structures occur
naturally in spatial and temporal reasoning (and spatial and temporal databases). In temporal
reasoning, time is usually modelled as an infinite linear order, sometimes discrete and
sometimes dense. This motivates our study of datalog on infinite linear orders. Let us remark
that our results also apply to interval-based temporal reasoning, carried out, for example,
in Allen's interval algebra [3] (see Sec. 3).

When studying the complexity of datalog on infinite structures, we consider the
structure as fixed, that is, we are interested in program complexity. Note that the result of a
datalog query on an infinite structure may be infinite, thus we cannot hope to compute
the full query result in finite time. A reasonable version of the query evaluation problem
that avoids this problem is the datalog tuple problem, which asks whether a given tuple of
elements is in the result. However, even for the tuple problem there is the technical issue 
of how to represent the elements of the tuple and how to represent the structure itself. 
The simpler datalog nonemptiness problem asks whether the result of a query is nonempty. It is
well known that on finite structures, the nonemptiness problem is in EXPTIME, and as long
as the structures have two elements that can be distinguished by a datalog program, it is
EXPTIME-complete (see Sec. 4). It is easy to see that there are infinite structures where
the nonemptiness problem is undecidable. An example is the structure with one infinite
successor relation (see Sec. 4).

Our first main result (Theorem 6.4) states that on all linear orders (finite or infinite),
the datalog nonemptiness problem is decidable in EXPTIME; for all linear orders with at
least two elements it is EXPTIME-complete.

A problem that has received considerable attention in the datalog literature is the
boundedness problem (see, e.g., [1, 11, 15]). A datalog program $\Pi$ is uniformly bounded on
a class $\mathcal C$ of structures if there is a number $b = b(\Pi, \mathcal C)$ such that for all
structures $\mathcal A$ in $\mathcal C$, the computation of $\Pi$ on $\mathcal A$ reaches a fixed point in at most
$b$ steps. The boundedness problem asks if a given program is uniformly bounded on the class of
all finite structures; it was shown to be undecidable in [11].

Our second main result (Theorem 7.2) states that every datalog program is uniformly
bounded on the class of all linear orders. This also leads to the decidability of the
datalog tuple problem on linear orders, provided the linear order satisfies certain effectivity
conditions.

Technically, both results are based on an analysis of the distance types of tuples pro- 
cessed in the evaluation of a datalog program. Types are a tool from model theory; the type 
of a tuple of elements records "definable" information about this tuple. Our distance types 
record information about the relative order of and the pairwise distances between elements 
of a tuple. The crucial technical fact underlying the results is that the whole computation of 
a datalog program on a linear order can be described in terms of a finite number of distance 
types that is bounded in terms of the program (independently of the structure). 

In the last section of this paper, we show how to incorporate constants into the distance 
type concept to transfer our results to datalog over linear orderings with a finite number of 
constants, which may occur in the datalog programs in question. 

As related results, let us mention recent results on the complexity of constraint sat- 
isfaction problems on infinite structures [H [19] . With some handwaving, the datalog 
nonemptiness problem may be viewed as a "recursive version" of constraint satisfaction
problems.^1

2. Preliminaries 

2.1. Datalog. An atom is an expression of the form $P(x_1, \dots, x_k)$, where $P$ is a $k$-ary relation
symbol and $x_1, \dots, x_k$ are variables. We admit 0-ary relation symbols.^2 In the following,

^1 Our actual starting point was an attempt to understand the complexity of constraint logic programming,
which combines logic programming and constraint satisfaction. It turned out, however, that this complexity
is dominated by the complexity of the "logic programming" part, which then led to our interest in datalog.

^2 A 0-ary relation either is empty, or it consists of the empty tuple ().



THE COMPLEXITY OF DATALOG ON LINEAR ORDERS 






we abbreviate tuples $(x_1, \dots, x_k)$ by $\bar x$. A datalog rule is an expression $\rho$ of the form

$$P\bar x \leftarrow Q_1\bar y_1, \dots, Q_m\bar y_m,$$

where $P\bar x, Q_1\bar y_1, \dots, Q_m\bar y_m$ are atoms. The tuples of variables $\bar x, \bar y_1, \dots, \bar y_m$ need not be
disjoint, and variables may occur several times in each tuple. Furthermore, the variables
in $\bar x$ are not required to be among those in $\bar y_1, \dots, \bar y_m$. The atom $P\bar x$ is the head of the
rule and $Q_1\bar y_1, \dots, Q_m\bar y_m$ is the body. A datalog program is a set of datalog rules. Relation
symbols occurring in the head of a rule of a datalog program $\Pi$ are called intensional relation
symbols or IDBs; all other relation symbols are called extensional relation symbols or EDBs.

Datalog programs are interpreted over relational structures. A vocabulary is a finite set
$\tau$ of relation symbols, each with a fixed arity. A structure $\mathcal A$ of vocabulary $\tau$ consists of a
(finite or infinite) set $A$ and a $k$-ary relation $R^{\mathcal A}$ for every $k$-ary relation symbol $R \in \tau$. We say
that a datalog program $\Pi$ is over a structure $\mathcal A$ if the vocabulary of $\mathcal A$ contains all EDBs
of $\Pi$ and none of the IDBs. $\Pi$ is a datalog program over a class $\mathcal C$ of structures if $\Pi$ is a
program over all $\mathcal A \in \mathcal C$.

Let $\Pi$ be a datalog program over a structure $\mathcal A$. The computation of $\Pi$ over $\mathcal A$ is carried
out in stages, in which the interpretation of the IDBs is computed; the interpretation of the
EDBs is given by $\mathcal A$ and remains fixed. Initially, all IDBs are interpreted by the empty set.
In each stage, a rule $\rho$ of $\Pi$ is applied, and some tuples of elements of $A$ are added to the
interpretation of the IDB occurring in the head of rule $\rho$. Formally, for every $k$-ary IDB $R$
we define a sequence $(R_i^{\Pi,\mathcal A})_{i \ge 0}$ of $k$-ary relations on the universe $A$ of $\mathcal A$. We let
$R_0^{\Pi,\mathcal A} = \emptyset$ for all IDBs $R$. Suppose now we have defined $R_{i-1}^{\Pi,\mathcal A}$ for all IDBs $R$. In stage $i$,
we choose a rule $\rho$, say, $P\bar x \leftarrow Q_1\bar y_1, \dots, Q_m\bar y_m$. An instantiation of $\rho$ at stage $i$ consists of
tuples $\bar a, \bar b_1, \dots, \bar b_m$ of elements of $A$ matching the lengths of the variable tuples
$\bar x, \bar y_1, \dots, \bar y_m$, such that

• If two variables of the rule are equal, then the corresponding elements of the tuples are
equal as well. For example, if $x_r = y_{st}$ then $a_r = b_{st}$.

• For $1 \le r \le m$: If $Q_r$ is an EDB, then $\bar b_r \in Q_r^{\mathcal A}$. If $Q_r$ is an IDB, then $\bar b_r \in (Q_r)_{i-1}^{\Pi,\mathcal A}$.

We let

$$P_i^{\Pi,\mathcal A} = P_{i-1}^{\Pi,\mathcal A} \cup \{\, \bar a \mid \text{there exist tuples } \bar b_1, \dots, \bar b_m \text{ such that } \bar a, \bar b_1, \dots, \bar b_m \text{ is an instantiation of rule } \rho \text{ at stage } i \,\}.$$

For all IDBs $R \ne P$, we let $R_i^{\Pi,\mathcal A} = R_{i-1}^{\Pi,\mathcal A}$. To turn this into a well-defined deterministic
process, we cycle through the rules $\rho$ of $\Pi$ in some fixed order. It can be shown that the
result of the computation does not depend on this order. (It will be convenient later to
apply only one rule at each stage; that is why we set up the computation this way.)

Note that for all IDBs $R$ and for all $i \ge 0$ we have $R_i^{\Pi,\mathcal A} \subseteq R_{i+1}^{\Pi,\mathcal A}$. The process either
reaches a fixed point after finitely many stages, that is, there is an $i_0$ such that
$R_i^{\Pi,\mathcal A} = R_{i_0}^{\Pi,\mathcal A}$ for all $i \ge i_0$, or it continues forever (recall that $\mathcal A$ may be infinite). In both
cases, we let $R_\infty^{\Pi,\mathcal A} = \bigcup_{i \ge 0} R_i^{\Pi,\mathcal A}$. Then the $R_\infty^{\Pi,\mathcal A}$ form a fixed point of the computation,
that is, further applications of the rules do not increase the relations. This is obvious if a
fixed point is reached in finitely many stages, but also easy to see if not. The result of the
computation is the interpretation of the IDBs in this fixed point.
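On a finite structure the staged computation described above always terminates, which makes it easy to try out. The following Python sketch is only an illustration under our own assumptions: the function name `evaluate` and the encoding of atoms as `(relation_name, variable_tuple)` pairs are hypothetical, and instead of applying exactly one rule per stage it simply repeats passes over all rules until nothing changes, which reaches the same fixed point.

```python
from itertools import product

def evaluate(universe, edb, rules):
    """Naive bottom-up evaluation on a FINITE structure (illustrative only).

    universe: iterable of elements
    edb:      dict mapping an EDB name to its set of tuples
    rules:    list of (head, body); every atom is (relation_name, variable_tuple)
    """
    universe = list(universe)
    idb = {head[0]: set() for head, _ in rules}   # all IDBs start empty
    changed = True
    while changed:                                 # stop at the fixed point
        changed = False
        for (pred, head_vars), body in rules:
            atoms = [(pred, head_vars)] + list(body)
            variables = sorted({v for _, vs in atoms for v in vs})
            # Try every assignment of universe elements to the rule's variables.
            for values in product(universe, repeat=len(variables)):
                env = dict(zip(variables, values))
                if all(tuple(env[v] for v in vs) in (edb[name] if name in edb else idb[name])
                       for name, vs in body):
                    fact = tuple(env[v] for v in head_vars)
                    if fact not in idb[pred]:
                        idb[pred].add(fact)
                        changed = True
    return idb

# Usage on the finite linear order 0 < 1 < 2 < 3:
order = {(a, b) for a in range(4) for b in range(4) if a < b}
rules = [(("T", ("x", "y")), [("<", ("x", "z")), ("<", ("z", "y"))])]
result = evaluate(range(4), {"<": order}, rules)
```

Here `T` collects the pairs that have at least one element strictly between them. On an infinite structure this brute-force loop is of course not available, which is exactly why a finite description of the stages is needed.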

We usually write $R_i^{\Pi}$ and $R_\infty^{\Pi}$ instead of $R_i^{\Pi,\mathcal A}$ and $R_\infty^{\Pi,\mathcal A}$ if $\mathcal A$ is clear from the context.
For easier reference, we define the following set of parameters for a datalog program $\Pi$:
By $m_I$ we denote the maximal IDB arity (i.e. the number of variables on the left-hand side, the
head part of program rules), by $m_R$ the maximal number of different variables occurring in a rule.
By $n_R$ we denote the number of rules of $\Pi$ and by $n_I$ the number of IDB symbols, by mi
the maximal number of IDB occurrences in a rule body. All these parameters are bounded
from above by the length $n := |\Pi|$ of $\Pi$ in some standard encoding.

For a more detailed introduction to datalog, we refer the reader to [2]. 

2.2. Linear orders. A linearly ordered set is a structure $\mathcal A = (A, <^{\mathcal A})$ of vocabulary $\{<\}$,
where the binary relation $<^{\mathcal A}$ is a linear order of the universe $A$. For brevity, we refer to
linearly ordered sets just as linear orders. Moreover, we usually omit the superscript in $<^{\mathcal A}$
and use the symbol $<$ to denote both the relation $<^{\mathcal A}$ and the relation symbol $<$. We write
$a \le b$ instead of ($a < b$ or $a = b$). The distance $d(a, b)$ between two elements $a \le b \in A$ is the
maximum $d \ge 0$ such that there are elements $c_0, \dots, c_d \in A$ with $a = c_0 < c_1 < \dots < c_d = b$
if this maximum exists, and $\infty$ otherwise. The linear order $(A, <)$ is dense without endpoints
if for all $a \in A$ there are $b, c \in A$ such that $b < a < c$, and for all $a, b \in A$ with $a < b$ there
is a $c \in A$ such that $a < c < b$.
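The two extreme behaviours of the distance function can be made concrete with a small Python sketch. It is only an illustration: the function names are ours, a finite order is assumed to be given as a list in increasing order, and the dense case stands for an order such as the rationals.

```python
import math

def distance_finite(universe, a, b):
    """d(a, b) for a <= b in a finite linear order listed in increasing order."""
    i, j = universe.index(a), universe.index(b)
    if i > j:
        raise ValueError("d(a, b) is only defined for a <= b")
    return j - i                      # length of the longest chain from a to b

def distance_dense(a, b):
    """d(a, b) in a dense order: chains between distinct elements are unbounded."""
    if a > b:
        raise ValueError("d(a, b) is only defined for a <= b")
    return 0 if a == b else math.inf
```

In a finite (or more generally discrete) order the distance counts steps, whereas in a dense order it only distinguishes equal from distinct elements.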

We consider linear orders in the strict sense, that is, a linear order is always antireflexive.
For orders in the sense of "less-than-or-equal-to", the datalog nonemptiness problem is
trivial,^3 because we can always satisfy all atoms by interpreting all variables by the same
element of the universe.



2.3. Algorithmic problems. We shall study the complexity of the following two decision 
problems for fixed structures A: 

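Datalog nonemptiness problem over $\mathcal A$

Instance: Datalog program $\Pi$ over $\mathcal A$, IDB $P$ of $\Pi$.
Question: Is $P_\infty^{\Pi,\mathcal A} \ne \emptyset$?

Datalog tuple problem over $\mathcal A$

Instance: Datalog program $\Pi$ over $\mathcal A$, $k$-ary IDB $P$ of $\Pi$, $k$-tuple $\bar a$ of
elements of $A$ (for some $k \ge 1$).
Question: Is $\bar a \in P_\infty^{\Pi,\mathcal A}$?
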
For an infinite structure (with finite vocabulary and finite EDB and IDB arities, but infinite
universe), the tuple problem raises some difficulties with regard to the representation of the
input tuple and the accessibility of the structure. To deal with the first difficulty, whenever
we consider the tuple problem we assume that the universe of the structure $\mathcal A$ is a decidable
set of strings over some finite alphabet. Furthermore, for linear orders $\mathcal A = (A, <^{\mathcal A})$ we
assume that it is decidable whether for elements $a, b \in A$ and a nonnegative integer $k$ there
exist $a_1, \dots, a_k \in A$ with $a <^{\mathcal A} a_1 <^{\mathcal A} \dots <^{\mathcal A} a_k <^{\mathcal A} b$. Note that if this is undecidable,
then the datalog tuple problem over $\mathcal A$ is also undecidable. Thus our assumption is just a
restriction to the interesting cases of the problem.

Let us emphasise that these effectivity assumptions on the representation are only 
required when we study the datalog tuple problem. For the nonemptiness problem, we do 
not need to make any assumptions on the representation or decidability of A whatsoever. 

^3 Actually, the problem is still PTIME-complete; it is equivalent to the datalog nonemptiness problem
over a structure with one element, which is equivalent to the satisfiability problem for propositional Horn
clauses.

3. Datalog on Allen's interval algebra

Allen's interval algebra, introduced in [3], is an algebra of relations over open intervals
on the real line. These interval relations are built as unions from the 13 basic relations
describing the pairwise relative end points of two intervals $(x^-, x^+)$ and $(y^-, y^+)$ as shown in
Table 1 (taken from [H]). The algebra of these $2^{13}$ relations is equipped with the operations
converse (denoted by $\cdot^{-1}$), intersection $\cap$ and composition $\circ$.

The complexity of constraint satisfaction problems over Allen's interval algebra and
variants has been extensively studied (see, e.g., [19, 20, 21]). Constraint satisfaction
problems may be viewed as datalog nonemptiness problems for programs with a single
non-recursive rule. Here, we are interested in the complexity of full datalog over the interval
algebra.

Table 1: The 13 basic relations of Allen's interval algebra. The obvious inequalities $x^- < x^+$
and $y^- < y^+$ of each case have been omitted.

Basic relation      Converse relation             Example       Endpoints
x precedes y   p    y preceded by x     p^{-1}    xxx           x^+ < y^-
                                                       yyy
x meets y      m    y met by x          m^{-1}    xxxx          x^+ = y^-
                                                      yyyy
x overlaps y   o    y overlapped by x   o^{-1}    xxxx          x^- < y^- < x^+ < y^+
                                                    yyyy
x during y     d    y includes x        d^{-1}      xxx         y^- < x^-, x^+ < y^+
                                                  yyyyyyy
x starts y     s    y started by x      s^{-1}    xxx           x^- = y^-, x^+ < y^+
                                                  yyyyyyyy
x finishes y   f    y finished by x     f^{-1}         xxx      y^- < x^-, x^+ = y^+
                                                  yyyyyyyy
x equals y     =                                  xxxxx         x^- = y^-, x^+ = y^+
                                                  yyyyy


Let X denote the structure whose universe consists of all open intervals on the real 
line, and whose relations are the relations of the interval algebra. We observe that datalog 
programs over X can easily be translated into programs over the linear order (R, <) and 
vice versa: 

Lemma 3.1. The datalog nonemptiness problem over X is LOGSPACE-equivalent to the 
datalog nonemptiness problem over (R, <). 

Proof. The reduction from the nonemptiness problem over X to the one over (R, <) is 
straightforward by replacing the interval variables by endpoint variables. Since we do not 
allow any equality relation to be used, we simulate equality by identifying variables. 

For the other direction, we transform the program $\Pi$ over (R, <) to $\Pi'$ by replacing all
atoms $x < y$ by $p(x, y)$. Then $\Pi'$ is satisfiable if and only if $\Pi$ is satisfiable: If $p(x, y)$ holds,
then $x^- < y^-$ is satisfied. If on the other hand $x < y$ holds, then there are elements $x^+$
and $y^+$ such that $p((x, x^+), (y, y^+))$ is satisfied, because the order (R, <) is dense.

Both reductions can clearly be carried out in logarithmic space. □ 
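The first reduction rests on the fact that each basic relation is just a condition on interval endpoints. As a small illustration, the endpoint column of Table 1 can be written down as Python predicates on pairs (left endpoint, right endpoint). The names `ALLEN` and `basic_relation` are ours, and the sketch only classifies concrete intervals; it is not the program translation itself.

```python
# Endpoint conditions of the seven non-converse basic relations (Table 1).
# An interval is a pair (lo, hi) with lo < hi.
ALLEN = {
    "p": lambda x, y: x[1] < y[0],                       # x precedes y
    "m": lambda x, y: x[1] == y[0],                      # x meets y
    "o": lambda x, y: x[0] < y[0] < x[1] < y[1],         # x overlaps y
    "d": lambda x, y: y[0] < x[0] and x[1] < y[1],       # x during y
    "s": lambda x, y: x[0] == y[0] and x[1] < y[1],      # x starts y
    "f": lambda x, y: y[0] < x[0] and x[1] == y[1],      # x finishes y
    "=": lambda x, y: x[0] == y[0] and x[1] == y[1],     # x equals y
}

def basic_relation(x, y):
    """Return the basic relation holding between x and y ('^-1' marks a converse)."""
    for name, test in ALLEN.items():
        if test(x, y):
            return name
        if name != "=" and test(y, x):
            return name + "^-1"
    raise ValueError("arguments must be proper intervals (lo < hi)")
```

Replacing an interval variable by its two endpoint variables and an interval atom by the corresponding endpoint condition is the idea behind the first direction of the lemma; equalities between endpoints are then handled by identifying variables, as stated in the proof.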

4. Lower Bounds 

The hardness results in this section are either from [8], or they can fairly easily be 
proved by the techniques used in [8]. It is easy to see that for every finite structure A, 
the datalog nonemptiness problem over A is in EXPTIME. For every structure A whose 
universe contains at most one element, the nonemptiness problem is in PTIME. Conversely,
for every structure A, the datalog nonemptiness problem over A is PTIME-hard, because 
the satisfiability problem for propositional Horn clauses is equivalent to the nonemptiness 
problem for datalog programs with only 0-ary relation symbols. As soon as a structure 
contains two distinguishable elements, the nonemptiness problem becomes EXPTIME-hard. 
This will be made precise in Lemma 4.1 below. For the reader's convenience, we sketch the
proof. It requires some preparation. 

A successor structure is a structure $\mathcal B = (B, S^{\mathcal B}, N^{\mathcal B})$, where $B$ is either finite or
countably infinite, and for some enumeration $b_0, b_1, \dots$ of $B$, the binary relation $S^{\mathcal B}$ consists of
all pairs $(b_i, b_{i+1})$, and the unary relation $N^{\mathcal B}$ only contains the element $b_0$.

Assume that in some structure $\mathcal A$ with universe $A$, we can define a successor structure.
This means that there exists a datalog program $\Pi$ with an $m$-ary IDB $U$, a $2m$-ary IDB $S$,
and an $m$-ary IDB $N$ such that the structure $\mathcal B = (B, S^{\mathcal B}, N^{\mathcal B})$ with $B = U_\infty^{\Pi,\mathcal A}$, $S^{\mathcal B} = S_\infty^{\Pi,\mathcal A}$,
and $N^{\mathcal B} = N_\infty^{\Pi,\mathcal A}$ is a successor structure. Then a given Turing machine transition function
can be translated to a datalog program defining the following IDB relations:

$\mathrm{symbol}_\sigma(x, y)$: In step $x$ of the computation the tape cell $y$ contains the symbol $\sigma$.

$\mathrm{cursor}(x, y)$: In step $x$ the cursor points to cell $y$.

$\mathrm{state}_s(x)$: In step $x$ the Turing machine is in state $s$.

$\mathrm{accept}$: The computation has reached an accepting state.

Here x and y range over elements of the defined successor structure B and hence can be 
viewed as encoding natural numbers, which are used to address time steps and tape cells. We 
may define auxiliary IDB relations ensuring the consistency of the simulation and encoding 
the input on the tape of the machine. Then we have a program, computable in logarithmic 
space from the machine encoding, whose IDB accept is derivable (and hence nonempty) if 
and only if a machine run accepts in a number of steps bounded by the size of the successor 
structure. 

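Lemma 4.1. Let $\mathcal A$ be a structure such that two relations $U_0, U_1 \subseteq A^k$, $k \in \mathbb N$, can be
defined by a datalog program on $\mathcal A$ such that

$$U_0 \cap U_1 = \emptyset, \qquad U_0 \ne \emptyset, \qquad U_1 \ne \emptyset.$$

Then the datalog nonemptiness problem over $\mathcal A$ is EXPTIME-hard.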

Proof. Without loss of generality we may assume that $\mathcal A$ actually contains two $k$-ary
relations $U_0, U_1$ which are nonempty and disjoint. Hence we can use these relations as EDB
predicates in a datalog program. We prove that any deterministic Turing machine
computation on input $x$, with $|x| = n$ and time bound $t(x) = 2^m$ ($m = m(n)$ being a function
of $n$), can be simulated by a datalog program with IDBs having at most $2 \cdot k \cdot m$
free variables.

The elements in $U_0$ are used as 0 and the elements in $U_1$ as 1 to build a successor
structure of binary vectors of arity $m$, leading to a successor substructure with values in
$[0..2^m]$. The details can be found in [8] with slight modifications.

The maximal arity of any IDB relation involved is $2 \cdot k \cdot m$, defining the successor
between two $m$-tuples of entries that have arity $k$.

By the construction of the Turing machine simulation, any machine computation
running at most $2^m$ steps can be simulated using datalog programs with maximal arity $2 \cdot k \cdot m$,
which concludes the proof. □
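The counting idea behind the proof can be illustrated independently of datalog. Once two distinguishable values playing the roles of 0 and 1 are available, the successor relation on $m$-bit vectors is a union of $m$ simple patterns, one for each position of the bit that flips from 0 to 1. The following Python sketch enumerates these patterns; the function name and the tuple encoding (most significant bit first) are our own choices, not the construction of [8].

```python
from itertools import product

def successor_pairs(m):
    """All pairs (v, w) of m-bit vectors with w = v + 1, most significant bit first."""
    pairs = set()
    for j in range(m):                      # j = number of trailing 1s that flip to 0
        for prefix in product((0, 1), repeat=m - j - 1):
            v = prefix + (0,) + (1,) * j    # ... 0 1 1 ... 1
            w = prefix + (1,) + (0,) * j    # ... 1 0 0 ... 0
            pairs.add((v, w))
    return pairs
```

Each of the $m$ patterns corresponds to one rule shape, so a program of size polynomial in $m$ suffices to address $2^m$ time steps and tape cells.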
Corollary 4.2. For every linear order $\mathcal A = (A, <)$ with at least two elements, the datalog
nonemptiness problem over $\mathcal A$ is EXPTIME-hard.

Proof. Let $U_0$ be the binary relation $x < y$ and $U_1$ the converse $x > y$. □

Note that over infinite structures, the datalog nonemptiness problem can easily become 
undecidable. One of the simplest examples of an infinite structure where this happens is an 
infinite successor structure: 

Proposition 4.3. Let B be an infinite successor structure. Then the datalog nonemptiness 
problem over B is undecidable. □ 

The proof is another straightforward Turing machine simulation. 

5. Distance types 

Types are a model theoretic tool that we shall use for dealing with datalog programs 
on infinite orders. We define an appropriate notion of type and prove a lemma that links 
them with the evaluation of datalog programs. 

Definition 5.1. 

(1) A distance atom is an expression of the form $x \le_d y$, $-\infty \le_d x$, or $x \le_d \infty$, where $x, y$
are variables and $d$ is a nonnegative integer. We may write $<_d$ instead of $\le_d$ for $d > 0$.
A distance type is a finite set of distance atoms.

We write $\delta(x_1, \dots, x_k)$ to indicate that the variables of the distance type $\delta$ are among
$x_1, \dots, x_k$. The set of all distance types with variables among $x_1, \dots, x_k$ is denoted by
$\Delta(x_1, \dots, x_k)$.

(2) Let $\mathcal A = (A, <)$ be a linear order, $\bar a = (a_1, \dots, a_k) \in A^k$, and let $\delta = \delta(x_1, \dots, x_k)$ be a
distance type. Then $(\mathcal A, \bar a)$ satisfies $\delta$ (we write: $\mathcal A \models \delta(\bar a)$)^4 if

- for all atoms $x_i \le_d x_j \in \delta$, there are $b_0, \dots, b_d \in A$ such that $a_i \le b_0 < b_1 < \dots < b_d \le a_j$
(that is, $x_i \le_d x_j$ is interpreted as $x_i \le x_j$ and $d(x_i, x_j) \ge d$);

- for all atoms $-\infty \le_d x_j \in \delta$, there are $b_0, \dots, b_d \in A$ such that $b_0 < b_1 < \dots < b_d \le a_j$;

- for all atoms $x_i \le_d \infty \in \delta$, there are $b_0, \dots, b_d \in A$ such that $a_i \le b_0 < b_1 < \dots < b_d$.

A distance type $\delta$ is satisfiable if there is a linear order $\mathcal A$ and a tuple $\bar a$ such that $(\mathcal A, \bar a)$
satisfies $\delta$.

(3) The rank of a distance atom $t \le_d u$ is $d$, and the rank of a distance type $\delta$ is the
maximum of the ranks of all atoms it contains. The set of all distance types in $\Delta(x_1, \dots, x_k)$
of rank at most $d$ is denoted by $\Delta_d(x_1, \dots, x_k)$.

(4) Let $\mathcal A = (A, <)$ be a linear order, $\bar a = (a_1, \dots, a_k) \in A^k$, and $d \ge 0$. The distance-$d$ type
of $\bar a$ in $\mathcal A$, denoted by $\mathrm{tp}_d(\mathcal A, \bar a)$, is the distance type that contains:

• for $1 \le i, j \le k$ with $a_i \le a_j$ the distance atom $x_i \le_c x_j$, where $c = \min\{d, d(a_i, a_j)\}$;

• for $1 \le j \le k$ the distance atom $-\infty \le_c x_j$, where $c \le d$ is maximum such that there
exist $b_0, \dots, b_c \in A$ with $b_0 < \dots < b_c \le a_j$;

• for $1 \le i \le k$ the distance atom $x_i \le_c \infty$, where $c \le d$ is maximum such that there
exist $b_0, \dots, b_c \in A$ with $a_i \le b_0 < \dots < b_c$.



(5) A distance type $\delta$ is complete if there exists a linear order $\mathcal A$, a tuple $\bar a$ with $\mathcal A \models \delta(\bar a)$,
and $d \ge 0$ such that for each pair $(a_i, a_j)$ of entries of $\bar a$ satisfying $a_i < a_j$ there is
precisely one distance atom $a_i \le_c a_j$ in $\delta$ with $0 < c \le d$, and for each pair $(a_i, a_j)$ with
$a_i = a_j$ there are distance atoms $a_i \le_0 a_j$ and $a_j \le_0 a_i$ in $\delta$.

The set of all complete distance types with variables among $x_1, \dots, x_k$ is denoted by
$\Gamma(x_1, \dots, x_k)$, and the set of all types in $\Gamma(x_1, \dots, x_k)$ of rank at most $d$ is denoted by
$\Gamma_d(x_1, \dots, x_k)$.

^4 Another common terminology is to say that a type is "realised" instead of "satisfied".

Example 5.2. An example for a distance type from $\Delta(x, y, z)$ is:

$\delta = x <_3 y,\ y <_2 z$

This type $\delta$ is satisfied for some elements $a_1, a_2, a_3 \in A$, which we assign to the variables
$x = a_1$, $y = a_2$ and $z = a_3$, if there exist $b_1, b_2, b_3 \in A$ with

$a_1 < b_1 < b_2 < a_2$ to satisfy $x <_3 y$,

$a_2 < b_3 < a_3$ to satisfy $y <_2 z$.

The occurring ranks of the atoms in $\delta$ show $\delta \in \Delta_3(x, y, z)$. $\delta$ is not complete, since
there is no distance atom containing $x$ and $z$ and no distance atom containing $-\infty$ or $\infty$.
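To experiment with such types, it helps to fix one concrete order. The following Python sketch checks satisfaction over the integers with their usual order, where the distance of $a \le b$ is simply $b - a$. The encoding of an atom relating two variables as a triple `(x, d, y)` and the function names are our own assumptions, and atoms involving $-\infty$ or $\infty$ are left out, since over the integers they always hold.

```python
def satisfies(assignment, delta):
    """Check a distance type over (Z, <).

    assignment: dict mapping variable names to integers
    delta:      iterable of triples (x, d, y), read as "x is at least d below y"
    """
    return all(assignment[y] - assignment[x] >= d for x, d, y in delta)

def rank(delta):
    """Rank of a distance type: the maximum rank of its atoms (0 if empty)."""
    return max((d for _, d, _ in delta), default=0)

# The type of Example 5.2.
delta = [("x", 3, "y"), ("y", 2, "z")]
```

With this encoding, the assignment x = 0, y = 3, z = 5 satisfies the type of the example, while x = 0, y = 2, z = 5 does not, because only one element lies strictly between 0 and 2.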

Let us point out some subtleties of these definitions that may be confusing. A distance
type need not be satisfiable, but a complete distance type must be satisfiable.^5 Even
though the "constants" $-\infty$ and $\infty$ appear in distance atoms, they are not part of the
datalog language, and we do not require linear orders to have a minimum or maximum.
The semantics of the atoms $-\infty \le_d x$ and $x \le_d \infty$ is well-defined in all linear orders.

Note that $x <_1 y$ is equivalent to $x < y$ and that $x \le_0 y \wedge y \le_0 x$ is equivalent to
$x = y$. A distance type of rank 1 only contains information about the relative order of the
variables and about equalities between the variables, and not about their distances. Hence
we call distance types of rank 1 order types.

It is easy to see that it can be decided in polynomial time in the number of variables
whether a distance type is satisfiable and whether it is complete.

Definition 5.3. Let $\gamma, \delta$ be distance types. Then $\gamma$ implies $\delta$ if for all distance atoms
$x \le_d x'$ in $\delta$ there is a distance atom $x \le_{d'} x'$ with $d \le d'$ in $\gamma$.

Lemma 5.4. 

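(1) Let $\gamma(\bar x), \delta(\bar y)$ be distance types such that $\gamma$ implies $\delta$. Then all variables in $\bar y$ also
appear in $\bar x$, and for every linear order $\mathcal A$ and every tuple $\bar a$ such that $\mathcal A \models \gamma(\bar a)$, for the
projection $\bar b$ of $\bar a$ to the variables in $\bar y$ we have $\mathcal A \models \delta(\bar b)$.

(2) Let $\delta \in \Delta_d(\bar x)$. Then for all linear orders $\mathcal A$ and all tuples $\bar a \in A^k$,

$$\mathcal A \models \delta(\bar a) \iff \text{there exists a } \gamma \in \Gamma_d(\bar x) \text{ such that } \gamma \text{ implies } \delta \text{ and } \mathcal A \models \gamma(\bar a). \tag{5.1}$$

We omit the straightforward proof. Note that statement (2) of the lemma implies that
every type can be written as a disjunction of complete types.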
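The implication relation between distance types is purely syntactic and therefore easy to check. A minimal Python sketch, assuming that an atom is encoded as a triple `(left, d, right)` whose endpoints are variable names or the strings "-inf" and "inf" (an encoding of ours, not of the paper):

```python
def implies(gamma, delta):
    """gamma implies delta: every atom of delta is matched in gamma by an atom
    on the same pair of endpoints with at least the same rank."""
    return all(
        any(gx == x and gy == y and d <= gd for gx, gd, gy in gamma)
        for x, d, y in delta
    )

gamma = {("x", 3, "y"), ("y", 2, "z"), ("x", 5, "z")}
delta = {("x", 2, "y")}
```

In this example `gamma` implies `delta`, but not conversely. Every type implies the empty type, which matches the reading of a type as a conjunction of constraints.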

Recall that for each IDB $P$ of a datalog program $\Pi$ over some fixed structure $\mathcal A$, by
$P_i^{\Pi}$ we denote the interpretation of $P$ after the $i$th stage of the computation of $\Pi$. In the
following lemma we will show how to describe the stages by finite sets of distance types,
but first we will have a look at a simple example.



^5 In model theory, it is common to define types as being satisfiable sets of formulas.
Example 5.5. Let $\Pi$ be the following two-rule program defining IDBs $P$ and $Q$:

$P(x, y) \leftarrow x < z_1,\ z_1 < z_2,\ z_2 < y.$

$Q(x, y, z) \leftarrow P(x, y),\ y < w,\ w < z.$

Applying the first rule to the empty IDB relations at the beginning, the resulting relation
$P_1^{\Pi}$ contains all tuples satisfying the distance type $x <_3 y$, since there have to be three
distance atoms satisfied between the elements assigned to $x$ and $y$.

Applying the second rule to this stage 1, this distance type is copied to the type
describing the tuples in $Q_2^{\Pi}$, and on $y$ and $z$ the type $y <_2 z$ is imposed, leading to the following
type describing $Q_2^{\Pi}$:

$\delta = x <_3 y,\ y <_2 z$

For programs using recursion and more rules leading to some form of disjunction, a
single distance type is not enough to describe a relation, but sets of types are needed.
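The claim of Example 5.5 can be checked by brute force on a finite initial segment of the natural numbers, where the distance between $a \le b$ is $b - a$. The following Python sketch is only such a sanity check; the function name and the choice of the order $0 < 1 < \dots < n-1$ are ours.

```python
def example_5_5(n):
    """Evaluate the two rules of Example 5.5 on the linear order 0 < 1 < ... < n-1."""
    A = range(n)
    # P(x, y) <- x < z1, z1 < z2, z2 < y.
    P = {(x, y) for x in A for y in A
         if any(x < z1 < z2 < y for z1 in A for z2 in A)}
    # Q(x, y, z) <- P(x, y), y < w, w < z.
    Q = {(x, y, z) for (x, y) in P for z in A
         if any(y < w < z for w in A)}
    return P, Q

P, Q = example_5_5(8)
```

On this order, `P` consists exactly of the pairs at distance at least 3, and `Q` of the triples that additionally have distance at least 2 between the second and third component, in line with the distance type given in the example.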

Lemma 5.6. Let $\mathcal A = (A, <)$ be an infinite linear order and $\Pi$ a datalog program over $\mathcal A$.

Then for each $k$-ary IDB $P$ of $\Pi$ and each $i \ge 0$ there is a finite set $\Theta(P, i) \subseteq
\Gamma(x_1, \dots, x_k)$ such that for all $\bar a \in A^k$ it holds that

$$\bar a \in P_i^{\Pi} \iff \text{there is a } \theta \in \Theta(P, i) \text{ such that } \mathcal A \models \theta(\bar a).$$

Furthermore, the rank of all types in $\Theta(P, i)$ is bounded by $(m_R)^i$, where $m_R$ denotes the
maximal number of variables in a rule of $\Pi$ as usual.

Proof. For $i \ge 0$, let $d_i := (m_R)^i$.

We prove the claim by induction on $i$. The induction base for $i = 0$ is obvious: We let
$\Theta(P, 0) = \emptyset$ for all IDBs $P$.

For the induction step ($i \to i + 1$), we consider the application of a rule

$$\rho:\quad P(\bar x) \leftarrow P^1(\bar y_1), \dots, P^\ell(\bar y_\ell), \varepsilon(\bar y),$$

where $P^1, \dots, P^\ell$ are IDBs and $\varepsilon(\bar y)$ is a list of EDB atoms with variables in $\bar y$. We view $\varepsilon$
as a distance type of rank 1; this will enable us to unify some of the arguments below. Let
$Z = \{z_1, \dots, z_m\}$ be the set of all variables occurring in $\rho$, and let $\bar z = (z_1, \dots, z_m)$. Note
that $m \le m_R$ and hence $m \cdot d_i \le d_{i+1}$.

For $1 \le j \le \ell$, let $\theta_j \in \Theta(P^j, i)$. Assume that there is a $\gamma \in \Gamma_{d_i}(z_1, \dots, z_m)$ which
implies $\theta_1, \dots, \theta_\ell$ and $\varepsilon$. Suppose $w_1, \dots, w_m$ is an enumeration of $Z$ in the order imposed
by $\gamma$, and let $e_0, \dots, e_m \ge 0$ such that $\gamma$ contains the distance atoms

$$-\infty \le_{e_0} w_1, \qquad w_p \le_{e_p} w_{p+1} \ \text{ for } p = 1, \dots, m-1, \qquad w_m \le_{e_m} \infty.$$

We define a new type $\gamma|_{\bar x}$ in the variables $\bar x$ as follows:

• Let $x$ be a variable in $\bar x$, say, $x = w_p$. Then $\gamma|_{\bar x}$ contains the distance atom $-\infty \le_d x$ for
$d = \sum_{r=0}^{p-1} e_r$.

• Let $x, x'$ be variables in $\bar x$, say, $x = w_p$ and $x' = w_q$. Suppose that $p < q$. Then $\gamma|_{\bar x}$
contains the distance atom $x \le_d x'$ for $d = \sum_{r=p}^{q-1} e_r$.

• Let $x$ be a variable in $\bar x$, say, $x = w_p$. Then $\gamma|_{\bar x}$ contains the distance atom $x \le_d \infty$ for
$d = \sum_{r=p}^{m} e_r$.

It is easy to see that $\gamma|_{\bar x}$ is a complete distance type in the variables $\bar x$. The rank of $\gamma|_{\bar x}$ is at
most $m$ times the rank of $\gamma$ and hence bounded by $m \cdot d_i \le d_{i+1}$. Therefore, $\gamma|_{\bar x} \in \Gamma_{d_{i+1}}(\bar x)$.
Furthermore, it is easy to verify that every tuple $\bar a$ that satisfies $\gamma|_{\bar x}$ has an extension to an
$m$-tuple $\bar c$ that satisfies $\gamma$.

We let $\Theta(P, i + 1)$ be the union of $\Theta(P, i)$ with all types $\gamma|_{\bar x}$ where $\gamma \in \Gamma_{d_i}(\bar z)$ such that
$\gamma$ implies $\varepsilon$ and there exist $\theta_j \in \Theta(P^j, i)$, for $1 \le j \le \ell$, implied by $\gamma$. We claim that for all
tuples $\bar a$ it holds that $\bar a \in P_{i+1}^{\Pi}$ if and only if there is a $\theta \in \Theta(P, i + 1)$ such that $\mathcal A \models \theta(\bar a)$.

To prove the forward direction of this claim, let $\bar a \in P_{i+1}^{\Pi}$. If $\bar a \in P_i^{\Pi}$, then by the
induction hypothesis there is a $\theta \in \Theta(P, i) \subseteq \Theta(P, i + 1)$ such that $\mathcal A \models \theta(\bar a)$. Suppose that
$\bar a \in P_{i+1}^{\Pi} \setminus P_i^{\Pi}$. Then there is a tuple $\bar c$ interpreting the variables $\bar z$, with projections $\bar a$ on
the coordinates of the variables in $\bar x$, $\bar b_j$ on the coordinates of the variables in $\bar y_j$, and $\bar b$ on
the coordinates of the variables in $\bar y$, such that $\bar b_j \in (P^j)_i^{\Pi}$ for $1 \le j \le \ell$ and $\mathcal A \models \varepsilon(\bar b)$. By
the induction hypothesis, for $1 \le j \le \ell$ there is a type $\theta_j \in \Theta(P^j, i)$ such that $\mathcal A \models \theta_j(\bar b_j)$.
Let $\gamma$ be the complete distance-$d_i$ type of $\bar c$. Then $\gamma|_{\bar x} \in \Theta(P, i + 1)$, and $\mathcal A \models \gamma|_{\bar x}(\bar a)$.

For the backward direction, suppose that a tuple $\bar a$ satisfies a type $\gamma'(\bar x) \in \Theta(P, i + 1)$.
If $\gamma'(\bar x) \in \Theta(P, i)$, then $\bar a \in P_i^{\Pi} \subseteq P_{i+1}^{\Pi}$ by the induction hypothesis. Otherwise, there is
a complete type $\gamma \in \Gamma_{d_i}(\bar z)$ and types $\theta_j \in \Theta(P^j, i)$ for $1 \le j \le \ell$ such that $\gamma' = \gamma|_{\bar x}$ and
$\gamma$ implies $\theta_1, \dots, \theta_\ell, \varepsilon$. Let $\bar c$ be an $m$-tuple satisfying $\gamma$ such that the projection of $\bar c$ on
the coordinates of the variables in $\bar x$ is $\bar a$. For $1 \le j \le \ell$, let $\bar b_j$ be the projection of $\bar c$ on
the coordinates of the variables in $\bar y_j$. Then $\mathcal A \models \theta_j(\bar b_j)$. By the induction hypothesis,
$\bar b_j \in (P^j)_i^{\Pi}$. Let $\bar b$ be the projection of $\bar c$ on the coordinates of the variables in $\bar y$. Then
$\mathcal A \models \varepsilon(\bar b)$. Putting everything together, we obtain $\bar a \in P_{i+1}^{\Pi}$. □
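The central step of the proof, passing from a complete type on all rule variables to a type on the head variables only, amounts to summing the ranks along the chain of variables. The following Python sketch shows this arithmetic under our own reading of the construction: `gaps` is assumed to hold the ranks $e_0, \dots, e_m$ of the atoms linking $-\infty$, the variables $w_1, \dots, w_m$ in increasing order, and $\infty$; `positions` maps each head variable to its index $p$ with $x = w_p$. Both names and the triple encoding of atoms are hypothetical, and head variables that coincide are not treated specially.

```python
def project(gaps, positions):
    """Project a chain type onto the head variables.

    gaps:      [e_0, ..., e_m] along  -inf, w_1, ..., w_m, inf
    positions: dict mapping each head variable to p, where the variable is w_p
    Returns atoms (left, d, right), read as "left is at least d below right".
    """
    atoms = set()
    for x, p in positions.items():
        atoms.add(("-inf", sum(gaps[:p]), x))          # e_0 + ... + e_{p-1}
        atoms.add((x, sum(gaps[p:]), "inf"))           # e_p + ... + e_m
        for y, q in positions.items():
            if p < q:
                atoms.add((x, sum(gaps[p:q]), y))      # e_p + ... + e_{q-1}
    return atoms
```

Since each sum has at most $m$ summands, every rank in the result is at most $m$ times the largest entry of `gaps`, which is the source of the factor in the rank bound.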

Remark 5.7. Note that actually we have proved a slightly stronger bound on the ranks of
the types in $\Theta(P, i)$: Letting $d_i$ be the maximum rank of all types in $\Theta(P, i)$, we have

$$d_{i+1} \le m_R \cdot d_i \tag{5.2}$$

for all $i \ge 0$. Furthermore, whereas the numbers $d_i$ may depend on the order in which we
apply the rules of the program, the bound (5.2) holds for all orders.

The following example shows that the ranks of the types can increase during a compu- 
tation in a way that can get quite complicated: 

Example 5.8. Consider the following program consisting of rules $\rho_1$, $\rho_2$ and $\rho_3$, with
$\bar x = (x_1, \dots, x_5)$. We use the abbreviation $x_i <_2 x_j$ for $x_i < y, y < x_j$, omitting some body
variable $y$ in the definition of $\Pi$, which does not appear elsewhere in the rules.

$\rho_1$: $P\bar x \leftarrow x_1 <_2 x_2,\ x_2 <_2 x_3,\ x_4 <_2 x_5.$

$\rho_2$: $P\bar x \leftarrow x_1 < x_2,\ x_4 < z_2,\ z_3 < y_4,\ y_5 < x_5,\ P(x_2, x_3, z_1, z_2, z_3),\ P(y_1, y_2, y_3, y_4, y_5).$

$\rho_3$: $P\bar x \leftarrow P(x_1, x_2, x_3, z_1, z_2),\ P(y_1, x_4, x_5, y_2, y_3).$

The rule $\rho_1$ is an initialization rule which initializes all given types to $<_2$.

The rule $\rho_2$ introduces $x_1 <_1 x_2$, reuses existing types by copying and sums up some
existing atoms from possibly different existing types.

This rule uses two recursive occurrences of the IDB $P$, which in our description of the
application of this rule by types leads to the use of two (possibly different) types from the
type set describing earlier stages of $P$. We denote the ranks of the distance atoms from
these two types occurring in our computation, using $\bar a = (a_1, \dots, a_5)$ for tuples from such
stages, by:

Ranks in distance types of the earlier stages of P

first occurrence of P      a_1 <_{c_1} a_2      a_4 <_{c_2} a_5
second occurrence of P     a_1 <_{c'_1} a_2     a_4 <_{c'_2} a_5



The rule application of $\rho_2$ using these distance atoms will then impose the following
type on the body variables, where the distance atoms are given in the order of the variable
appearance in the rule, omitting the atoms containing the variables $z_1$, $y_1$, $y_2$ and $y_3$, which are not
part of the result:

$x_1 <_1 x_2,\ x_4 <_1 z_2,\ z_3 <_1 y_4,\ y_5 <_1 x_5,\ x_2 <_{c_1} x_3,\ z_2 <_{c_2} z_3,\ y_4 <_{c'_2} y_5$

To combine these types by eliminating non-head variables, we rearrange these atoms:

$x_1 <_1 x_2,\ x_2 <_{c_1} x_3,\ x_4 <_1 z_2,\ z_2 <_{c_2} z_3,\ z_3 <_1 y_4,\ y_4 <_{c'_2} y_5,\ y_5 <_1 x_5$

After the elimination of non-head variables, the following type is added to the type set
of $P$:

$x_1 <_1 x_2,\ x_2 <_{c_1} x_3,\ x_4 <_{c_2 + c'_2 + 3} x_5$

Rule $\rho_3$ copies some distance atoms for $x_1, x_2, x_3$ and transfers some $x_2 <_c x_3$ to $x_4 <_c x_5$
in the result. We conclude the example with the shortest program run leading to a
fixed point, described by the ranks of the types $x_1 <_{d_1} x_2$, $x_2 <_{d_2} x_3$ and $x_4 <_{d_3} x_5$. We
assume that always the smallest ranks are chosen. Longer runs could lead to even bigger
intermediate results, but will have the same final result.



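| step | rule | $d_1$ | $d_2$ | $d_3$ | remarks |
|---|---|---|---|---|---|
| 1 | $\rho_1$ | 2 | 2 | 2 | |
| 2 | $\rho_2$ | 1 | 2 | 7 | |
| 3 | $\rho_2$ | 1 | 1 | 12 | using tuples in line 2 and 1 |
| 4 | $\rho_3$ | 1 | 1 | 1 | using tuple in line 3 twice |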
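
The rank bookkeeping of this example can be replayed mechanically. The following is a minimal sketch, assuming that each type is summarised only by the three ranks $(d_1, d_2, d_3)$ tracked in the table; the function names `rho1`, `rho2`, `rho3` and `shortest_run` are illustrative and not part of the original text.

```python
# Each type is summarised by its ranks (d1, d2, d3) for the atoms
# x1 <_{d1} x2, x2 <_{d2} x3 and x4 <_{d3} x5 (an assumption of this sketch).

def rho1():
    # Initialization rule: every tracked atom gets rank 2.
    return (2, 2, 2)

def rho2(first, second):
    # first  = ranks of the type used for P(x2, x3, z1, z2, z3)
    # second = ranks of the type used for P(y1, y2, y3, y4, y5)
    c1, c2 = first[0], first[2]   # a1 <_{c1} a2 and a4 <_{c2} a5
    c2_prime = second[2]          # a4 <_{c2'} a5
    # Head type: x1 <_1 x2, x2 <_{c1} x3, x4 <_{c2 + c2' + 3} x5.
    return (1, c1, c2 + c2_prime + 3)

def rho3(first, second):
    # P(x1, x2, x3, z1, z2) copies d1 and d2; P(y1, x4, x5, y2, y3)
    # transfers its second rank to the atom on x4, x5.
    return (first[0], first[1], second[1])

def shortest_run():
    line1 = rho1()
    line2 = rho2(line1, line1)
    line3 = rho2(line2, line1)   # "using tuples in line 2 and 1"
    line4 = rho3(line3, line3)   # "using tuple in line 3 twice"
    return [line1, line2, line3, line4]

for step, ranks in enumerate(shortest_run(), start=1):
    print(step, ranks)
```

The four printed triples agree with the rows of the table; the intermediate rank 12 in step 3 is larger than any rank in the final type.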



6. Upper Bounds for datalog programs on orders 

Now that we have a formal description of the IDB relations in this case, we will use
the concept of discrete order types to show an upper bound for the datalog nonemptiness
problem. Before doing so, we transform the program into a normal form which integrates the
possible order types into the program by creating disjoint copies of each IDB, each having
a different order type and hence leading to disjoint relations.

Definition 6.1. A datalog program $\Pi$ over linear orders is type-disjoint if for every $k$-ary
IDB $P$ there is a complete order type $\gamma_P \in \Gamma_1(x_1, \dots, x_k)$ such that for all linear orders
$\mathcal{A} = (A, <)$ and all tuples $\bar{a} \in P^{\mathcal{A}}_\infty$ it holds that $\mathrm{tp}_1(\mathcal{A}, \bar{a}) = \gamma_P$.

The order type of an IDB $P$ in a type-disjoint program $\Pi$ is the order type $\gamma_P$.

Lemma 6.2. For every datalog program $\Pi$ over linear orders there is a type-disjoint datalog
program $\Pi'$ over linear orders with the following properties:

(1) For every IDB $P$ of $\Pi$ there are IDBs $P_1, \dots, P_{n_P}$ of $\Pi'$ of pairwise distinct order types,
such that for every linear order $\mathcal{A}$,

$$P^{\mathcal{A}}_\infty = \bigcup_{j=1}^{n_P} (P_j)^{\mathcal{A}}_\infty.$$

(2) $n'_I \le n_I \cdot 3^{m_L^2}$, $n'_R \le 3^{m_L^2 (m_I+1)} \cdot n_R$, $m'_R = m_R$, $m'_L = m_L$ and $m'_I = m_I$, where $n'_I$, $n'_R$,
$m'_R$, $m'_L$, $m'_I$ are the parameters of $\Pi'$.

Furthermore, the program $\Pi'$ can be computed from $\Pi$ in exponential time.

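Proof. From each IDB $P$ of arity $r$, we create $n_P = 3^{r^2}$ distinct copies $P_0, \dots, P_{n_P - 1}$, each
having a different order type. For $i \in \{0, \dots, n_P - 1\}$, let $(i_0, \dots, i_{r^2-1})$ be the ternary
representation of the number $i$, that is, $i = \sum_{j=0}^{r^2-1} i_j \cdot 3^j$ with $0 \le i_j < 3$ for all $j = 0, \dots, r^2 - 1$.
Then we link to each new IDB $P_i$ a distance-1-type $\gamma_{P_i}$, which consists of the following
distance atoms:

$$\begin{array}{ll}
x_{j_1} = x_{j_2} & \text{if } i_{j_1 + j_2 \cdot r} = 0,\\
x_{j_1} <_1 x_{j_2} & \text{if } i_{j_1 + j_2 \cdot r} = 1,\\
x_{j_2} <_1 x_{j_1} & \text{if } i_{j_1 + j_2 \cdot r} = 2.
\end{array}$$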


So each combination of distance atoms for all pairs of variables will be present in some
$\gamma_{P_i}$. After computing these distance types, we transform the program in two stages. First,
we change the head IDBs to the new IDB set consisting of the distinct copies created as
above for each IDB of $\Pi$: Each rule $\rho$ with head atom $P\bar{x}$ is replaced by copies $\rho'_0, \dots, \rho'_{n_P - 1}$
with head $P_j\bar{x}$ for $j \in \{0, \dots, n_P - 1\}$, and the body copied from $\rho$ and extended by EDBs
$x_{j_1} < x_{j_2}$ for each $(x_{j_1} <_1 x_{j_2}) \in \gamma_{P_j}$. In case of $(x_{j_1} = x_{j_2}) \in \gamma_{P_j}$, we replace all occurrences
of $x_{j_2}$ by $x_{j_1}$ afterwards. This simulates the equality relation, which is not available as EDB
or IDB relation.

Each of the rules $\rho'_0, \dots, \rho'_{n_P - 1}$ is then itself replaced by copies which, instead of the
body IDBs from $\Pi$, use the IDBs of $\Pi'$:

In each rule $\rho'$, say, $P_i\bar{x} \leftarrow Q_1\bar{y}_1, \dots, Q_m\bar{y}_m, \varepsilon(\bar{x}, \bar{y}_1, \dots, \bar{y}_m)$, with $\varepsilon$ being a sequence
of EDBs, each IDB $Q_j$ of $\Pi$ has been converted to a set of IDBs $\{Q_{j,\ell}\}$. From these sets
we generate all possible combinations $(Q_{1,\ell_1}, \dots, Q_{m,\ell_m})$ and create from each combination
a rule of $\Pi'$:

$$P_i\bar{x} \leftarrow Q_{1,\ell_1}\bar{y}_1, \dots, Q_{m,\ell_m}\bar{y}_m, \varepsilon(\bar{x}, \bar{y}_1, \dots, \bar{y}_m),$$

where the sequence $\varepsilon$ of EDB atoms is left untouched.

After that, we directly eliminate every rule with an inconsistent order type. This can
be done by viewing the rule as a graph with the variables being the nodes and the order
atoms being directed edges. A check for a directed cycle, which can be carried out in time
polynomial in the rule length, shows whether the order type is inconsistent.
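
The consistency check just described is ordinary cycle detection in a directed graph. A minimal sketch, assuming that the order atoms of a rule are given as pairs `(u, v)` standing for $u < v$ (this representation and the function name `has_inconsistent_order` are illustrative):

```python
def has_inconsistent_order(atoms):
    """Return True if the atoms u < v contain a directed cycle.

    atoms: iterable of pairs (u, v), each meaning u < v.
    """
    graph = {}
    for u, v in atoms:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the stack, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for successor in graph[node]:
            if colour[successor] == GREY:  # back edge closes a cycle
                return True
            if colour[successor] == WHITE and visit(successor):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)

print(has_inconsistent_order([("x", "y"), ("y", "z")]))              # consistent
print(has_inconsistent_order([("x", "y"), ("y", "z"), ("z", "x")]))  # cyclic
```

The depth-first search visits every node and edge once, so the check is linear in the number of order atoms; the recursion is adequate for rule bodies of moderate length.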

Each tuple added to a stage in the evaluation of $\Pi$ introduced by some rule $\rho$ of $\Pi$ has
a complete distance-1 type, so there will be one of the copies of $\rho$ which can be applied to
add this tuple. Conversely, the newly created rules of $\Pi'$ may only add tuples for which a
rule in $\Pi$ exists adding this tuple.

Each IDB of arity $r$ is converted to not more than $3^{r^2}$ copies and hence $n'_I \le n_I \cdot 3^{m_L^2}$.
For each rule, we need all combinations of copies of the newly created IDBs, adding up to
at most $n'_R \le \left(3^{m_L^2}\right)^{(m_I+1)} \cdot n_R = 3^{m_L^2 (m_I+1)} \cdot n_R$. □
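
To get a feeling for the size of these bounds, consider a small worked instance. The values $m_L = 2$ and $m_I = 1$ are chosen purely for illustration:

$$n'_I \le n_I \cdot 3^{2^2} = 81 \cdot n_I, \qquad n'_R \le 3^{2^2 \cdot (1+1)} \cdot n_R = 3^{8} \cdot n_R = 6561 \cdot n_R.$$

Even for these small parameters the type-disjoint program is several thousand times larger than the original one, which is consistent with the exponential-time bound for computing $\Pi'$.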

For type-disjoint datalog programs, the nonemptiness problem can be solved in a simple 
fashion, essentially disregarding any recursion in rules. In the following lemma, we construct 
an execution sequence s that will suffice to decide the datalog nonemptiness problem. 

Lemma 6.3. Let $\Pi$ be a type-disjoint datalog program over an infinite linear order $\mathcal{A} =
(A, <)$. Then there exist an $i_s \le n_I$ and a sequence $s = (\rho_0, \rho_1, \dots, \rho_{i_s - 1})$ of rules, such
that after applying $\rho_i$ to stage $i$ for $i = 0, \dots, i_s - 1$, the emptiness is determined, i.e. for
all IDBs $P$ it holds that

$$P^{\mathcal{A}}_{i_s} = \emptyset \iff P^{\mathcal{A}}_\infty = \emptyset. \qquad (6.1)$$

Such a sequence $s$ can be computed in time $n_R \cdot n_I$.

Proof. We create the sequence $s$ by cycling through the rules $n_I$ times, adding those rules
to $s$ which change an empty IDB to nonempty. Formally, $s = (\rho_0, \rho_1, \dots, \rho_{i_s - 1})$ is such that
there exist IDBs $P_0, \dots, P_{i_s - 1}$ with $(P_i)^{\mathcal{A}}_i = \emptyset$ and, after applying $\rho_i$ to $(P_i)^{\mathcal{A}}_i$, $(P_i)^{\mathcal{A}}_{i+1} \ne \emptyset$ for
$i = 0, \dots, i_s - 1$.

We continue this process until no more rules can be applied to make an empty IDB
nonempty, but this can happen at most $n_I$ times, immediately leading to the time bound
for the computation. Note that nonempty IDBs are never modified by $s$.

The crucial observation is that, in a type-disjoint program, it only depends on the
nonemptiness of the IDBs in the body if a rule adds new tuples to the head IDB, and not
on the actual content of the body IDBs. This follows from the fact that at each stage
the content of the IDBs is a union of disjoint complete types by Lemma 5.6 and that the
distance types are monotone by Corollary 7.3. Thus in an infinite order, we can always add
all tuples of sufficiently large finite distances.
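
Under this observation, computing the initialization sequence reduces to a simple saturation over the rules. The following sketch assumes that every rule is abstracted to a pair `(head, body_idbs)` and that rules with inconsistent order atoms have already been removed; the function name `initialization_sequence` and this rule representation are illustrative and not taken from the text.

```python
def initialization_sequence(rules, idbs):
    """Cycle through the rules at most len(idbs) times.

    rules: list of (head, body_idbs) pairs of a type-disjoint program,
           where a rule fires as soon as all of its body IDBs are nonempty.
    Returns the indices of the rules that turned an empty IDB nonempty,
    together with the set of nonempty IDBs.
    """
    nonempty = set()
    sequence = []
    for _ in range(len(idbs)):
        changed = False
        for index, (head, body) in enumerate(rules):
            if head not in nonempty and all(q in nonempty for q in body):
                nonempty.add(head)
                sequence.append(index)
                changed = True
        if not changed:
            break
    return sequence, nonempty

rules = [("P", ["Q"]), ("Q", []), ("R", ["S"]), ("S", ["R"])]
print(initialization_sequence(rules, {"P", "Q", "R", "S"}))
```

In this toy program `R` and `S` depend only on each other and therefore stay empty, while `Q` and then `P` become nonempty.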

We now show property (6.1) by contradiction:

Let $U = \{ R \mid R^{\mathcal{A}}_{i_s} = \emptyset \wedge R^{\mathcal{A}}_\infty \ne \emptyset \}$ be the set of IDBs changing to nonempty after $s$
and assume $U \ne \emptyset$. Then for each $R \in U$ there exist an $i_R \in \mathbb{N}$ and a rule $\rho_R$ with:

$R^{\mathcal{A}}_{i_R} = \emptyset$, and applying $\rho_R$ to $R^{\mathcal{A}}_{i_R}$: $R^{\mathcal{A}}_{i_R + 1} \ne \emptyset$.

Let $P \in U$ be the IDB with $i_P = \min\{ i_R \mid R \in U \}$. By the definition of $U$ and by the
choice of $i_P$, all $Q \in U \setminus \{P\}$ have to satisfy $Q^{\mathcal{A}}_{i_P} = \emptyset$. Since a rule can be applied if and only if
all body IDBs are nonempty, the rule $\rho_P$ cannot depend on them and can be applied in stage
$i_s$, leading to a sequence of rule applications making more IDBs nonempty, a contradiction
to the construction of $s$. □

Theorem 6.4. The datalog nonemptiness problem over linear orders $\mathcal{A} = (A, <)$ is
EXPTIME-complete.

Proof. The proof is a combination of several earlier results. A datalog program $\Pi$ can by
Lemma 6.2 be converted to a type-disjoint program $\Pi'$. For this kind of program, Lemma 6.3
gives us a method to check which IDB relations of $\Pi'$ will be empty after an evaluation of
$\Pi'$. Since $\Pi'$ is type-disjoint, each IDB relation of the original program $\Pi$ will occur here as
a collection of IDBs of $\Pi'$, which can easily be determined. Thus, the question "$P^{\mathcal{A}}_\infty = \emptyset$?"
can be answered by checking the type sets of all corresponding IDBs of $\Pi'$. Besides the time
for this check and the time for the conversion of the programs, the time for determining
the empty IDB relations of $\Pi'$ is part of the running time. Using Lemmas 6.2 and 6.3, the
time of this step can be bounded from above by $O(n_R \cdot 9^{m_L^2 (m_I+1)})$, altogether clearly in
EXPTIME, and with the EXPTIME-hardness shown earlier the claim follows. □

7. Boundedness

A datalog program $\Pi$ is bounded on a structure $\mathcal{A}$ if there is a computation of $\Pi$ on $\mathcal{A}$
that reaches a fixed point after finitely many stages. Of course, this concept of boundedness
is nontrivial only on infinite structures. The main result of this section is that datalog
programs are bounded on linear orders. Actually, we prove a stronger result giving a
uniform bound on the number of evaluation steps that is computable from the size of the
program and does not depend on the structure. This stronger result is even meaningful for
finite linear orders.

Definition 7.1. Let $\Pi$ be a datalog program over a structure $\mathcal{A}$.

(1) A computation sequence for $\Pi$ is a sequence $s$ of rules of $\Pi$ to compute all IDB relations,
i.e. a sequence of rules satisfying the following conditions:

• If $s$ is finite, then after applying $s$, no further rule application adds a tuple to the
IDB relations.

• If $s$ is infinite, then each rule of $\Pi$ occurs infinitely often.

(2) The closure ordinal of $\Pi$ on $\mathcal{A}$, denoted by $\mathrm{cl}(\Pi, \mathcal{A})$, is the length of the shortest
computation sequence for $\Pi$ on $\mathcal{A}$ ($\mathrm{cl}(\Pi, \mathcal{A}) = \infty$ if all computation sequences are of infinite
length).

(3) $\Pi$ is bounded on $\mathcal{A}$ if $\mathrm{cl}(\Pi, \mathcal{A}) < \infty$.

Now let $\mathcal{C}$ be a class of structures such that $\Pi$ is a program over $\mathcal{C}$.

(4) The uniform closure ordinal of $\Pi$ on $\mathcal{C}$, denoted by $\mathrm{ucl}(\Pi, \mathcal{C})$, is the maximum of the
closure ordinals $\mathrm{cl}(\Pi, \mathcal{A})$ for $\mathcal{A} \in \mathcal{C}$, if this maximum exists, and $\infty$ otherwise.

(5) $\Pi$ is uniformly bounded on $\mathcal{C}$ if $\mathrm{ucl}(\Pi, \mathcal{C}) < \infty$.

Note that if $\Pi$ is uniformly bounded on $\mathcal{C}$, then it is bounded on all $\mathcal{A} \in \mathcal{C}$, but that
the converse does not necessarily hold.

Theorem 7.2. Let $\Pi$ be a datalog program over the class $\mathcal{LO}$ of linear orders. Then $\Pi$ is
uniformly bounded on $\mathcal{LO}$. More precisely, there is a computable function $b : \mathbb{N} \to \mathbb{N}$ such
that for $n = |\Pi|$ it holds that

$$\mathrm{ucl}(\Pi, \mathcal{LO}) \le b(n).$$

Our proof of Theorem 7.2 is based on a simplification of the distance type concept
which we will discuss before the presentation of the main proof. The proof presented here
is an extension of the proof of Theorem 6.4, first transforming the program $\Pi$ in question
to a type-disjoint version $\Pi'$ by Lemma 6.2 and then creating the initialization sequence $s$
as in Lemma 6.3. After this process, we may eliminate all IDB relations that are still empty. Each
remaining IDB $P$ may only contain tuples of one complete order type $\gamma_P$.

Let $\gamma \in \Gamma(x_1, \dots, x_n)$ be a complete distance type. Observe that $\gamma$ is completely
determined by its underlying order type and the distances $d$ imposed by the distance atoms
$-\infty <_d x$, $x <_d x'$, $x <_d \infty$. We can describe the distances by a tuple $\bar{d}^\gamma = (d^\gamma_1, \dots, d^\gamma_k)$ of
length $k = 2n + \binom{n}{2}$ with nonnegative integer entries. We call $\bar{d}^\gamma$ the rank vector of $\gamma$. We
define a partial order $\preceq$ on the complete distance types in $\Gamma(x_1, \dots, x_n)$ by letting $\gamma \preceq \gamma'$
if $\gamma$ and $\gamma'$ imply the same order type and $d^\gamma_i \le d^{\gamma'}_i$ for $1 \le i \le k$. Observe that $\gamma \preceq \gamma'$
if and only if $\gamma'$ implies $\gamma$. The following corollary is hence an immediate consequence of
Lemma 5.4(1):

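Corollary 7.3. Let $\mathcal{A} = (A, <)$ be a linear order, $\bar{a} \in A^k$, and $\gamma, \gamma' \in \Gamma(x_1, \dots, x_k)$ such
that $\gamma \preceq \gamma'$. Then

$$\mathcal{A} \models \gamma'(\bar{a}) \implies \mathcal{A} \models \gamma(\bar{a}).$$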

The crucial observation that we will exploit in the following is that the computation
of a type-disjoint datalog program can be described entirely in terms of sequences of rank
vectors for the IDBs. This follows from Lemma 5.6, stating that the computation can be
described in terms of complete types, and the observation that for type-disjoint programs it
suffices to consider the rank vectors, because the order types of the IDBs are fixed.

After applying the sequence $s$ and after eliminating empty IDBs, for each IDB $P$, the set
$\Theta(P, i_s)$ is described by exactly one such vector, since $|\Theta(P, i_s)| = 1$ after the initialization
sequence, which adds at most one type to the type set of each IDB. By Corollary 7.3, all
tuples realizing a type $\gamma'$ with $\gamma \preceq \gamma'$ also realize the weaker type $\gamma$. Hence increasing the
size of an IDB relation $P$ by adding new tuples (which realize a newly added type $\gamma'$) is
only possible if in all present types $\gamma \in \Theta(P, i)$ some atom rank of $\gamma$ is greater than the
corresponding rank in $\gamma'$, i.e. $\gamma \not\preceq \gamma'$.

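In terms of rank vectors, a type $\gamma$ defines a set $\mathcal{H}_\gamma$ containing the vectors of all types
$\gamma'$ that are at least as restrictive as $\gamma$, i.e. $\gamma \preceq \gamma'$:

$$\mathcal{H}_\gamma = \{ (x_1, \dots, x_{k_P}) \mid x_\ell \ge d^\gamma_\ell \text{ for } \ell = 1, \dots, k_P \}$$

Speaking of rank vectors, $\gamma \preceq \gamma'$ if the rank vector $\bar{d}^\gamma$ is dominated by the rank vector
$\bar{d}^{\gamma'}$, i.e. for all $i = 1, \dots, k_P$, $d^\gamma_i \le d^{\gamma'}_i$. Then $\mathcal{H}_\gamma$ is the set of all types with a rank vector
dominating the rank vector of $\gamma$.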

Then creating a sequence of new types added to $\Theta(P, i)$ is equivalent to the search
for a non-dominating sequence of rank vectors, where we call a (finite or infinite) sequence
$\bar{x}_1, \bar{x}_2, \dots$ non-dominating if for all $i$ and $j$ with $i < j$, $\bar{x}_j$ does not dominate $\bar{x}_i$.
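
The notion of a non-dominating sequence is easy to state in code. The sketch below represents rank vectors as tuples of nonnegative integers of equal length; the helper names `dominates` and `is_non_dominating` are illustrative.

```python
def dominates(u, v):
    """u dominates v if u is at least v in every coordinate."""
    return all(a >= b for a, b in zip(u, v))

def is_non_dominating(sequence):
    """Check that no later vector dominates an earlier one."""
    return not any(
        dominates(sequence[j], sequence[i])
        for j in range(len(sequence))
        for i in range(j)
    )

print(is_non_dominating([(3, 1), (1, 3), (2, 2)]))  # no vector dominates an earlier one
print(is_non_dominating([(1, 1), (2, 2)]))          # (2, 2) dominates (1, 1)
```

Note that repeating a vector also violates the condition, since every vector dominates itself.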

Figure 1 shows a graphical representation for $k_P = 2$. Figure 1(c) shows a case where
a new vector is added containing a coordinate greater than the maximum of all existing
entries. But this growth can only occur in a limited manner, as we will show. First, we
introduce some notation.



[Figure 1: three panels (a), (b) and (c), each showing a hatched region of the plane.]

Example of the description of an IDB relation with rank vectors of length 2 (x and y coordinate).
Figure (a) shows a description with one rank vector, automatically including all types with rank
vectors in the hatched area. Figure (b) shows the situation after a second rank vector was added,
automatically including more types. In Figure (c), another vector is added.

Figure 1: Geometric Representation of a Type Set



Definition 7.4. Let $k \in \mathbb{N}$ and $\bar{x} = (x_1, \dots, x_k) \in \mathbb{N}^k$. Then $\|\bar{x}\|_\infty := \max\{x_1, \dots, x_k\}$.
For $S \subseteq \mathbb{N}^k$, let $\|S\|_\infty := \max_{\bar{x} \in S} \|\bar{x}\|_\infty$. Let $s_1, \dots, s_\ell$ be finite sequences, each sequence
consisting of tuples of some arity, and let $C = (s_1, \dots, s_\ell)$ be a tuple of these sequences.
Then $\|C\|_\infty := \max_{i=1}^{\ell} \|s_i\|_\infty$, where the sequences are considered as sets.

To model the rank vectors occurring in the stages of the IDB relations, we introduce a
corresponding sequence concept:

Definition 7.5 ($c$-Bounded Run). Let $t \in \mathbb{N}$, let $k_1, \dots, k_t \in \mathbb{N}$ and for $i = 1, \dots, t$ let
$\bar{x}_i \in \mathbb{N}^{k_i}$. Let $c \in \mathbb{N}$. Then $X$ is a $c$-bounded run of $(\bar{x}_1, \dots, \bar{x}_t)$, if

• $s^0_1, \dots, s^0_t$ are sequences of tuples, where for each $i$, $s^0_i$ consists of the tuple $\bar{x}_i$ only.

• The stage $X_0$ of $X$ is the tuple $X_0 = (s^0_1, \dots, s^0_t)$.

• Inductively, the $j$-th stage $X_j = (s^j_1, \dots, s^j_t)$ of $X$ is created from the stage $X_{j-1}$ by
choosing an $\ell \in \{1, \dots, t\}$, a $\mu_j \in \mathbb{N}$, and $\{\bar{x}_1, \dots, \bar{x}_{\mu_j}\} \subseteq \mathbb{N}^{k_\ell}$ such that

  - $\mu_j \le (\|X_0\|_\infty \cdot c^{j})^{k_\ell}$

  - for $n \ne \ell$: $s^j_n = s^{j-1}_n$

  - $s^j_\ell = s^{j-1}_\ell \circ (\bar{x}_1, \dots, \bar{x}_{\mu_j})$ ($\circ$ meaning sequence concatenation)

  - $s^j_\ell$ is non-dominating

  - $\|\bar{x}_i\|_\infty \le c \cdot \|X_{j-1}\|_\infty$ for all $i = 1, \dots, \mu_j$.
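
One stage of such a run can be sketched as follows. The code is only an illustration under simplifying assumptions: a stage is a list of sequences of tuples, and only the non-domination condition and the coordinate bound $c \cdot \|X_{j-1}\|_\infty$ are checked, while the bound on the number $\mu_j$ of added tuples is deliberately left out. The names `sup_norm` and `try_extend` are hypothetical.

```python
def dominates(u, v):
    """u dominates v if u is at least v in every coordinate."""
    return all(a >= b for a, b in zip(u, v))

def sup_norm(stage):
    """Largest entry occurring in any tuple of any sequence of the stage."""
    return max(max(vec) for seq in stage for vec in seq)

def try_extend(stage, index, new_vectors, c):
    """Append new_vectors to sequence number `index` of the stage.

    Returns the next stage, or None if a new vector exceeds the bound
    c * sup_norm(stage) or dominates an earlier vector of that sequence.
    """
    bound = c * sup_norm(stage)
    extended = list(stage[index])
    for vec in new_vectors:
        if max(vec) > bound:
            return None
        if any(dominates(vec, earlier) for earlier in extended):
            return None
        extended.append(vec)
    return [extended if n == index else list(seq) for n, seq in enumerate(stage)]

stage0 = [[(2, 2)], [(1,)]]
print(try_extend(stage0, 0, [(1, 4)], 2))   # allowed: bound is 4, no domination
print(try_extend(stage0, 0, [(3, 3)], 2))   # rejected: (3, 3) dominates (2, 2)
```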

The condition on $\mu_j$ ensures that the sequence added in each stage is finite and bounded
from above by some function of $\|X_0\|_\infty$, $c$ and $j$, which will be needed for the computation
of a uniform bound. The connection between the setting of datalog programs on orders and
the $c$-bounded runs is given by the following lemma:

Lemma 7.6. Let $t$ be the number of nonempty IDB relations of the type-disjoint program
$\Pi'$ after the initialization sequence $s$ of length $i_s$ from Lemma 6.3. Then for each nonempty
IDB relation $P$, the set $\Theta(P, i_s)$ contains exactly one rank vector. Let $\bar{d}_1, \dots, \bar{d}_t$ be these
rank vectors. Let $m = \max\{m'_R, m'_I, m'_L\}$.

(1) For all $j = 1, \dots, t$: $\|\bar{d}_j\|_\infty \le (m'_R)^{i_s} \le (m'_R)^{n'_R n'_I}$.

(2) For each computation of $\Pi'$ continuing the initialization sequence, the rank vectors added
during this computation form an $m$-bounded run $X$ of $(\bar{d}_1, \dots, \bar{d}_t)$.

Proof. To prove (1), note that $\|\bar{d}_j\|_\infty \le (m'_R)^{i_s}$ follows from Lemma 5.6 and $(m'_R)^{i_s} \le
(m'_R)^{n'_R n'_I}$ follows from Lemma 6.3.

(2) is proved by induction on the steps of the computation. Suppose at stage $i$ of the
computation of $\Pi'$, a rule $\rho$ with IDB $P$ in its head is applied. Let $\gamma'_1, \dots, \gamma'_{\ell'}$ be the types
in $\Theta(P, i+1) \setminus \Theta(P, i)$. It may be that some of the $\gamma'_i$ dominate $\gamma'_j$ for $j < i$ or some $\gamma \in \Theta(P, i)$.
We omit all these $\gamma'_i$ and obtain a sequence $\gamma_1, \dots, \gamma_\ell$. Adding their rank vectors to the
run obtained so far, we obtain a non-dominating sequence. The $m$-boundedness of the run
follows from Remark 5.7. □

To show a computable uniform bound on c-bounded runs, we need two well known 
lemmas which we state here without proof: 

Lemma 7.7 (König's Tree Lemma (see, e.g., [13], [10])). Let $T$ be an infinite rooted
directed tree with finite branching (i.e. each vertex has a finite number of children). Then
$T$ contains an infinite path starting at the root node.

Lemma 7.8 (Dickson's Lemma (see, e.g., [14], [9])). All non-dominating sequences of
tuples of natural numbers are finite.

These finiteness (or infiniteness) properties allow us to compute a bound on the number 
of stages of c-bounded runs: 

Lemma 7.9. There is a nondecreasing computable function $f : \mathbb{N} \times \mathbb{N} \times \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ with
the following property:

For all $m \in \mathbb{N}$, $c \in \mathbb{N}$, $t \in \mathbb{N}$, $r \in \mathbb{N}$, $k_1, \dots, k_t \in \{1, \dots, r\}$ and $\bar{x}_i \in \mathbb{N}^{k_i}$ with
$\|\bar{x}_i\|_\infty \le m$, each $c$-bounded run of $(\bar{x}_1, \dots, \bar{x}_t)$ has at most $f(m, c, t, r)$ stages.

Proof. First, we have a look at an arbitrary choice of $m$, $c$, $t$, $r$, $k_1, \dots, k_t$ and $\bar{x}_i$ (for
$i = 1, \dots, t$).

We create a labeled tree $T$ containing all $c$-bounded runs of these values: The root node
is labeled with $X_0$. Inductively, for each node labeled with a stage $X_i$, we create a child
node for each stage $X_{i+1}$ created from $X_i$ and label it with the corresponding stage.

To create a stage $X_{i+1}$ from $X_i$, we may choose each of the $t$ sequences to extend it.
Each sequence $s^i_j$ has an arity $k_j$ and by the last condition of a $c$-bounded run, the element
added to this sequence may only have coordinates that are at most $c \cdot \|X_i\|_\infty$. Because of
this and since the length of the extension of the sequence in stage $i + 1$ is bounded
from above by $(\|X_0\|_\infty \cdot c^{i+1})^{k_j}$, there are only finitely many choices for a finite extension of
a sequence and hence finitely many children for each node in this tree $T$ (each for a different
extension of some sequence).

A path in $T$ (starting at the root node) corresponds to one $c$-bounded run. We now
show that each path is finite: Assume we have an infinite path $p$ in $T$. This path $p$ is
labeled with the stages of a $c$-bounded run $X$. In each stage one of the sequences of $X$ is
extended by finitely many elements, and since there are only $t$ sequences there has to be
one sequence that is extended in infinitely many stages. Each extension of this sequence is
non-dominating, so we get an infinite non-dominating sequence. But by Dickson's Lemma
each non-dominating sequence of tuples of natural numbers is finite, a contradiction. Hence
all paths in $T$ are finite.

Hence $T$ has finite branching (only finitely many children to each node) and no infinite
path. By König's Lemma, $T$ must be finite.

The height of $T$ is the greatest number of stages that can occur in a $c$-bounded run of
$(\bar{x}_1, \dots, \bar{x}_t)$. Since $T$ is finite, we can compute the whole tree and determine its height.

We discuss how to compute the value $f(m, c, t, r)$ for given values $m$, $c$, $t$ and $r$:

For each fixed choice $k_1, \dots, k_t$ of arities, the entries of all choices of the corresponding
tuples $\bar{x}_1, \dots, \bar{x}_t$ are bounded by $m$ and thus for each tuple there are only finitely many
choices. By computing the height of the tree (by creating the tree) for each choice of tuples
one after the other and determining the maximum $h(k_1, \dots, k_t)$, we have computed a bound
on the number of stages for the $c$-bounded runs with sequence arities $k_1, \dots, k_t$.

The parameter $t$ determines the number of sequences in the runs considered and the
parameter $r$ limits the arities of these sequences. The maximum of the values $h(k_1, \dots, k_t)$
over all $r^t$ possible sequence arity tuples $(k_1, \dots, k_t)$ is then the maximal number of stages
in a $c$-bounded run with $t$ sequences and sequence arities at most $r$, and it can be computed
by computing $h(k_1, \dots, k_t)$ for all finitely many choices.

This maximum satisfies the properties of $f(m, c, t, r)$, and that $f$ is nondecreasing is
immediate: When some parameter is increased, all runs remain valid, but longer runs may
also appear. □
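
The tree argument can be made concrete for tiny parameters by exhaustive search. The following sketch is a simplification and not the procedure of the proof: it assumes a single sequence ($t = 1$) and exactly one new vector per stage, and it returns the height of the resulting tree. The function name `longest_run` is illustrative. Termination relies on Dickson's Lemma; the running time grows very quickly with the arity and with $c$.

```python
from itertools import product

def dominates(u, v):
    """u dominates v if u is at least v in every coordinate."""
    return all(a >= b for a, b in zip(u, v))

def longest_run(sequence, c):
    """Maximal number of further stages starting from `sequence`.

    Simplifying assumptions: one sequence only, one vector added per stage,
    every new coordinate is at most c times the largest entry seen so far.
    """
    arity = len(sequence[0])
    bound = c * max(max(vec) for vec in sequence)
    best = 0
    for candidate in product(range(bound + 1), repeat=arity):
        if any(dominates(candidate, earlier) for earlier in sequence):
            continue  # adding it would violate non-domination
        best = max(best, 1 + longest_run(sequence + [candidate], c))
    return best

print(longest_run([(1,)], 2))   # in one dimension only smaller numbers may follow
print(longest_run([(2,)], 3))
```

In one dimension a non-dominating sequence is strictly decreasing, so the run starting at `(2,)` can only continue with `(1,)` and then `(0,)`. In two dimensions the bound $c \cdot \|X_{j-1}\|_\infty$ allows entries to grow before the run is forced to stop, which is the effect illustrated in Figure 1(c).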

This function will directly lead to the function $b$ of Theorem 7.2.

Proof (of Theorem 7.2). The program $\Pi$ over a linear ordering $\mathcal{A} = (A, <)$ is first converted
to an equivalent type-disjoint version $\Pi'$ as in Lemma 6.2, which also gives the bounds
$n'_I \le n_I \cdot 3^{m_L^2}$, $n'_R \le 3^{m_L^2 (m_I+1)} \cdot n_R$, $m'_R = m_R$ and $m'_L = m_L$ for the parameters of the new
program $\Pi'$. Then the initialization sequence $s$ as in Lemma 6.3 is determined, resulting in
the first $i_s \le n'_I \le n_I \cdot 3^{m_L^2}$ stages.

While the empty relations of $\Pi'$ can be neglected, each nonempty relation $P$ of $\Pi'$
has a type description $\Theta(P, i)$ with one rank vector each, and by Lemma 7.6 these rank
vectors satisfy the properties of an $m'_R$-bounded run $X$. By Lemma 7.9, $X$ has at most
$f((m'_R)^{n'_R n'_I}, m'_R, n'_I, m'_L)$ stages.

We let b(n) := f(n 3 " , n, n-3 n , n) and by the above bounds on the parameters and since 
each program parameter is bounded from above by the program length n, 
f((m R ) n R n i,m R ,n' I ,m' L ) < /(n 3 ™ , n, n ■ 3 n , n) < b(n) and the claim follows. 

Since $\mathcal{A}$ was chosen as an arbitrary linear order, this bound also holds for $\mathrm{ucl}(\Pi, \mathcal{LO})$. □

It now follows easily that the datalog tuple problem is decidable on all linear orders,
provided that the orders satisfy the effectivity conditions described in Section 2.3, which
say that the elements of a structure are effectively represented, and that the distance-$d$ type
of a tuple can be computed.

Theorem 7.10. The datalog tuple problem on linear orders is decidable. 

Proof. Using Lemma 5.6 and Theorem 7.2, for each IDB we can compute the set of all
complete types of tuples that are contained in an IDB relation after the computation of the
datalog program has been carried out. Then to decide whether a given tuple is contained in
an IDB relation, we only have to check if it satisfies one of these types. □

The uniform closure ordinal of a datalog program can also be used to decide the
nonemptiness problem. By the EXPTIME-hardness of nonemptiness, it follows that the 
uniform closure ordinal of a datalog program over linear orders has to be at least singly 
exponential. We suspect that this lower bound is closer to the actual closure ordinal than 
our computable upper bound. 

On dense linear orders without endpoints, we can match the singly exponential lower 
bound. Let DLO denote the class of dense linear orders without endpoints. 

Theorem 7.11. Let $\Pi$ be a datalog program over the class of linear orders and $n = |\Pi|$.
Then

$$\mathrm{ucl}(\Pi, \mathrm{DLO}) \le 3^{n^2}.$$

Proof. Observe that on dense linear orders, distance types collapse to order types, because
for all $a$, $b$ and for all $d \ge 1$ we have $a < b \iff a <_d b$. So the only types to consider are
distance-1-types (including equality atoms, when we consider complete distance-1-types),
and as the type-disjoint version of a program as introduced in Lemma 6.2 contains all
complete distance-1-types, it contains all types of interest for this case. After evaluating the
initialization sequence as computed in Lemma 6.3, all complete distance-1-types describing
the IDB relations are computed and hence no rule can be applied after that to add new
types.

Since there are at most $3^{n^2}$ different distance-1-types for a program of length $n$, the claim
follows. □
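
The collapse of distance types on dense orders can be checked on the rationals. The sketch below reads $a <_d b$ as the existence of a chain $a = e_0 < e_1 < \dots < e_d = b$, as the abbreviation $<_2$ in Example 5.8 suggests; this reading and the function name `witness_chain` are assumptions of the sketch.

```python
from fractions import Fraction

def witness_chain(a, b, d):
    """Return a chain a = e_0 < e_1 < ... < e_d = b of rationals.

    Such a chain exists for every a < b and every d >= 1 because the
    rationals are dense; here the points are simply spaced evenly.
    """
    if not a < b or d < 1:
        raise ValueError("need a < b and d >= 1")
    step = (b - a) / d
    return [a + k * step for k in range(d + 1)]

chain = witness_chain(Fraction(0), Fraction(1), 4)
print(chain)
```

For the inputs above the chain consists of the five points 0, 1/4, 1/2, 3/4 and 1, so $0 <_4 1$ holds on the rationals, and the same construction works for every $d \ge 1$.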

8. Relational structures with constants 

We may also consider datalog programs over linear orders $\mathcal{A} = (A, <, c_1, \dots, c_r)$ with
finitely many constants $c_1, \dots, c_r$, each of them being interpreted as a fixed element of $A$,
which may be used in the datalog programs in question.

To solve the datalog nonemptiness or tuple problem for such a program $\Pi$ with constant
symbols, we transform the program to a constant-free version $\Pi'$ by replacing each constant
$c_i$ by a fresh variable and adding rule parts to transfer the values of all constants to all rules
which are used during program execution. Using this technique, we show:

Theorem 8.1. The datalog tuple problem on linear orders $\mathcal{A} = (A, <, c_1, \dots, c_r)$ with
finitely many constants (which may be used by the datalog programs) is decidable.

Proof. We first transform the program $\Pi$ over $\mathcal{A} = (A, <, c_1, \dots, c_r)$ to a program $\Pi'$ over
the structure $\mathcal{A}' = (A, <)$, increasing the arity of each IDB symbol by $r$, such that for each
IDB $P$ of $\Pi$ with arity $s$ and its corresponding IDB $P'$ of $\Pi'$, and each $\bar{a} = (a_1, \dots, a_s) \in A^s$,
the following holds:

$$\bar{a} \in P^{\mathcal{A}}_\infty \iff \bar{c}\bar{a} = (c_1^{\mathcal{A}}, \dots, c_r^{\mathcal{A}}, a_1, \dots, a_s) \in (P')^{\mathcal{A}'}_\infty \qquad (8.1)$$

This is established by replacing all occurrences of the constant $c_i$ by a fresh variable
$C_i$, for $i = 1, \dots, r$. To ensure that all rule applications share the same values for the
constants, we augment each IDB $P$ of $\Pi$ by the additional variables $\bar{C} = (C_1, \dots, C_r)$ and
replace all occurrences of $P(\bar{x})$ by $P'(\bar{C}, \bar{x})$, both in the rule bodies and in the rule heads. This
forces the values of the variables $C_1, \dots, C_r$ to be identical in the body and head of each
rule and hence in the whole program. For example, if $\varphi(c_1, \dots, c_r, x_1, \dots, x_m, y_1, \dots, y_n)$ is
the formula appearing in the body of the rule

$$P(x_1, \dots, x_m) \leftarrow \varphi(c_1, \dots, c_r, x_1, \dots, x_m, y_1, \dots, y_n),$$

we translate this rule to:

$$P'(C_1, \dots, C_r, x_1, \dots, x_m) \leftarrow \varphi(C_1, \dots, C_r, x_1, \dots, x_m, y_1, \dots, y_n).$$

Now the original program $\Pi$ and the modified version $\Pi'$ satisfy condition (8.1).
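
The transformation is purely syntactic and can be sketched in a few lines. The representation below is an assumption made for illustration: an atom is a pair `(predicate, args)`, a rule is a pair `(head, body)`, and the fresh variable for the $k$-th constant is named `C_k`; the function name `eliminate_constants` is hypothetical.

```python
def eliminate_constants(rules, idbs, constants):
    """Replace constants by fresh variables and thread them through all IDBs.

    rules:     list of (head, body), atoms given as (predicate, args)
    idbs:      set of IDB predicate names
    constants: list of constant symbols, e.g. ["c1", "c2"]
    """
    fresh = {c: f"C_{k}" for k, c in enumerate(constants, start=1)}
    prefix = tuple(fresh[c] for c in constants)

    def convert(atom):
        predicate, args = atom
        args = tuple(fresh.get(a, a) for a in args)   # constant -> variable
        if predicate in idbs:
            # P(x) becomes P'(C_1, ..., C_r, x)
            return (predicate + "'", prefix + args)
        return (predicate, args)

    return [(convert(head), [convert(a) for a in body]) for head, body in rules]

rule = (("P", ("x",)), [("<", ("c1", "x")), ("P", ("y",)), ("<", ("y", "x"))])
print(eliminate_constants([rule], {"P"}, ["c1"]))
```

Each rule is rewritten independently, so the size of the program grows only by a factor depending on the fixed number $r$ of constants.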

This transformation can be carried out in logarithmic space, since the number $r$ of
constants depends only on the structure $\mathcal{A}$, not on the input $(\Pi, P, \bar{a})$ of the tuple problem.
The above construction is a logspace reduction from the tuple problem over linear orders
with constants to the tuple problem over linear orders without constants, mapping the input
$(\Pi, P, \bar{a})$ to the instance $(\Pi', P', \bar{c}\bar{a})$, increasing $m_L$ and $m_R$ by $r$ and the total program size
by a linear factor. For this constant-free version of the tuple problem, Theorem 7.10 shows
us how to solve it, calculating the type sets introduced in Lemma 5.6. □

We can also use the type sets computed in the above proof for solving the nonemptiness
problem on $\mathcal{A} = (A, <, c_1, \dots, c_r)$:

Corollary 8.2. The datalog nonemptiness problem on linear orders $\mathcal{A} = (A, <, c_1, \dots, c_r)$
with finitely many constants is decidable.

Proof. On input $(\Pi, P)$, we first calculate the type sets for the modified version $\Pi'$, in which
the constants have been replaced by the variables $C_1, \dots, C_r$ as above. Then we instantiate
the variables $C_1, \dots, C_r$ by the constants of $\mathcal{A}$ in all types in $\Theta(P', \infty)$ for the IDB $P'$ of $\Pi'$
corresponding to the IDB $P$ of $\Pi$, each $C_i$ by $c_i^{\mathcal{A}}$. Types which are then no longer satisfiable are
deleted from the set. $P^{\mathcal{A}}_\infty$ is nonempty if and only if satisfiable types remain.

□ 
An intuitive argument for why the upper bound for the datalog nonemptiness problem
on orders with constants is much higher than the complexity of the case without constants
is that, as soon as constants are involved, the solution process has to consider distances
between constants. A solution may, for some two constants $c_i < c_j$, require a number of
elements to be present in $\mathcal{A}$ between $c_i$ and $c_j$ which is higher than the actual number
of elements between $c_i$ and $c_j$ in $\mathcal{A}$. This case is only handled correctly by using distance
types; order types alone do not suffice.

However, on dense linear orderings we may do better and match the EXPTIME lower 
bound: 

Corollary 8.3. The datalog nonemptiness problem on dense linear orders
$\mathcal{A} = (A, <, c_1, \dots, c_r)$ with finitely many constants can be decided in exponential time.

Proof. This result is a straightforward combination of the preceding proof and
Theorem 7.11. □

Note that even though we use the decidability of the datalog tuple problem to prove
the decidability of the datalog nonemptiness problem for linear orders with constants, we
do not need to make any effectivity assumptions on the linear order $\mathcal{A}$ (cf. Sec. 2.3) here.
The reason is that the constants are fixed in advance as part of the structure and thus all
information about them can be hardwired into the algorithm.



9. Concluding remarks 

We studied the complexity of datalog on linear orders. We precisely determined the 
complexity of the datalog nonemptiness problem: It is EXPTIME-complete on all linear 
orders with at least two elements. We also obtained a computable uniform upper bound on 
the closure ordinal of datalog programs on linear orders. The best lower bound we know
for the uniform closure ordinals is singly exponential. 

The upper bound on the closure ordinals can be used to prove that the datalog tuple 
problem is decidable on computable orders leading to the same complexity bound for the 
nonemptiness and tuple problems on orders with constants. Based on these results, an 
implementation of the distance type concept for calculations seems feasible for applications, 
e.g. in temporal and spatial reasoning. 

In his forthcoming PhD thesis [22], the second author showed that most of the results
obtained here can be extended to colored linear orders, that is, linear orders with additional 
unary predicates, and, at least partially, to colored trees, where trees are viewed as partial 
orders. 